[ML] Adding asynchronous start up logic for the inference API internals #135462

jonathan-buttner · 2025-09-25T17:45:25Z

This PR adds functionality to allow for an asynchronous version of the startup logic within the inference API. We haven't yet seen problems with doing this synchronously. Doing it asynchronously makes changes to the dynamic preconfigured inference endpoints a little bit easier.

This PR also adds SubscribableListener in a few places to make the flow easier since we're relying on a listener for the applicable methods instead of blocking.


jonathan-buttner · 2025-09-25T17:52:38Z

.../elasticsearch/xpack/inference/services/amazonbedrock/client/AmazonBedrockRequestSender.java

    }

+    /**
+     * TODO implement this functionality to ensure that we don't block node bootups


Converting bedrock is going to take a little more work. Probably best to do this in a separate PR because this one is already 50 files 😬

jonathan-buttner · 2025-09-25T17:54:24Z

...rence/services/elastic/authorization/ElasticInferenceServiceAuthorizationRequestHandler.java


-            var requestMetadata = extractRequestMetadataFromThreadContext(threadPool.getThreadContext());
-            var request = new ElasticInferenceServiceAuthorizationRequest(baseUrl, getCurrentTraceInfo(), requestMetadata);
+            SubscribableListener.newForked(sender::startAsynchronously).<InferenceServiceResults>andThen((authListener) -> {


Now we're doing an async start and then once that completes we do the rest of the functionality as normal.


elasticsearchmachine · 2025-09-26T12:17:19Z

Pinging @elastic/ml-core (Team:ML)

DonalEvans · 2025-09-26T17:17:05Z

.../src/main/java/org/elasticsearch/xpack/inference/external/http/sender/HttpRequestSender.java

+    @Override
+    public void startSynchronously() {
+        if (started.compareAndSet(false, true)) {
+            startInternal(ActionListener.noop());


Will this cause any exception thrown in startInternal() to be ignored when doing a synchronous start? Also, do we need to make sure that we always call waitForStartToComplete() before returning from this method? If someone calls startAsynchronously() then another thread immediately calls startSynchronously(), the second call will return immediately (because we already set started to true) but the sender won't actually have started yet.

Good point, I'll make those changes.
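To make the race above concrete, here is a minimal sketch of the shape being discussed: the `started` flag decides who performs the start, while a latch decides when callers may return. `SketchSender` and its members are hypothetical stand-ins written for this illustration, not the real `HttpRequestSender`, and failure propagation from the start work is deliberately left out.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the sender; it only models the started flag and a completion latch. */
class SketchSender {
    private final AtomicBoolean started = new AtomicBoolean(false);
    private final CountDownLatch startCompleted = new CountDownLatch(1);
    private final AtomicInteger startCount = new AtomicInteger();
    private final long timeoutMillis;

    SketchSender(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Only the first caller triggers the start; the work runs on a separate thread. */
    void startAsynchronously(Runnable slowStartWork) {
        if (started.compareAndSet(false, true)) {
            new Thread(() -> {
                slowStartWork.run();
                startCount.incrementAndGet();
                startCompleted.countDown();
            }).start();
        }
    }

    void startSynchronously() {
        if (started.compareAndSet(false, true)) {
            startCount.incrementAndGet();
            startCompleted.countDown();
        }
        // Always wait: if another thread won the compareAndSet, its start may still be in flight.
        waitForStartToComplete();
    }

    void waitForStartToComplete() {
        try {
            if (!startCompleted.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("sender did not start in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for start", e);
        }
    }

    int startCount() {
        return startCount.get();
    }

    boolean isStartComplete() {
        return startCompleted.getCount() == 0;
    }
}
```

With the unconditional wait in place, a `startSynchronously()` call that loses the race to an in-flight `startAsynchronously()` blocks until the latch is released instead of returning early.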

DonalEvans · 2025-09-26T17:29:13Z

...plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/SenderService.java

-        init();
-        doStart(model, listener);
+        SubscribableListener.newForked(this::init)
+            .<Boolean>andThen((doStartListener) -> doStart(model, doStartListener))


What is the purpose of calling doStart() here? It seems to be a no-op that just immediately returns.

The idea is that it can be overridden by child classes. In reality I don't think any actually override it yet. The Elasticsearch integration does use it but that doesn't extend from SenderService.

Gotcha, thanks for the explanation
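For readers unfamiliar with the chaining in the diff above, the following is a rough analogue that swaps in the JDK's `CompletableFuture` for `SubscribableListener` so that it can run without Elasticsearch on the classpath. The class name, the `steps` list, and the `String` model parameter are assumptions made for this sketch only; the real `init()` and `doStart()` take listeners rather than returning futures.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical analogue of the init-then-doStart chain, using CompletableFuture instead of SubscribableListener. */
class ChainedStartSketch {
    // Records the order in which the steps ran, for illustration only.
    final List<String> steps = new CopyOnWriteArrayList<>();

    /** Asynchronous initialization; completes once the (simulated) sender has started. */
    CompletableFuture<Void> init() {
        return CompletableFuture.runAsync(() -> steps.add("init"));
    }

    /** Default hook that completes immediately; subclasses could override it to do real work. */
    CompletableFuture<Boolean> doStart(String model) {
        steps.add("doStart:" + model);
        return CompletableFuture.completedFuture(true);
    }

    /** Runs doStart only after init has completed, without blocking the calling thread. */
    CompletableFuture<Boolean> start(String model) {
        return init().thenCompose(ignored -> doStart(model));
    }
}
```

The default `doStart` here mirrors the no-op described in the reply: it exists as an extension point and simply reports success.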

DonalEvans · 2025-09-26T18:13:54Z

...test/java/org/elasticsearch/xpack/inference/external/http/sender/HttpRequestSenderTests.java

+            sender.startSynchronously();
+            sender.startSynchronously();
+            sender.startSynchronously();


It would be good to add some tests for the startAsynchronously() method, since it's a distinct implementation from startSynchronously(). Also, a test that calling startAsynchronously() followed immediately by startSynchronously() behaves the way we expect would be good.

DonalEvans · 2025-09-26T18:19:13Z

...sticsearch/xpack/inference/services/amazonbedrock/client/AmazonBedrockMockRequestSender.java


+    @Override
+    public void startAsynchronously(ActionListener<Void> listener) {
+        listener.onResponse(null);


Would it be better to have this method throw the same UnsupportedOperationException as AmazonBedrockRequestSender.startAsynchronously()? It would be nice to have a little extra confidence that we're not calling an unsupported method.

DonalEvans · 2025-09-26T18:20:45Z

.../elasticsearch/xpack/inference/services/amazonbedrock/client/AmazonBedrockRequestSender.java

    @Override
-    public void start() {
+    public void startAsynchronously(ActionListener<Void> listener) {
+        throw new UnsupportedOperationException("not implemented");


Would it be worth wrapping this throw in a check on the value of started? If the sender has already been started, then calling startAsynchronously() should have no effect.

Hmm I think in that situation we should still throw. It would be a bug if we're ever calling that method for AmazonBedrockRequestSender.

DonalEvans · 2025-09-26T18:30:01Z

.../services/elastic/authorization/ElasticInferenceServiceAuthorizationRequestHandlerTests.java

    }

-    @SuppressWarnings("unchecked")
-    public void testGetAuthorization_OnResponseCalledOnce() throws IOException {


Why was this test deleted?

Yeah, sorry, I meant to comment on this and forgot 😅. I'll add it back; it was giving me problems because we're mocking the listener, but I think I found a way to fix it.


DonalEvans · 2025-09-29T19:35:40Z

...test/java/org/elasticsearch/xpack/inference/external/http/sender/HttpRequestSenderTests.java

+            // Checking for both exception types because there's a race condition between the Error being thrown on a separate thread
+            // and the startCompleted latch timing out waiting for the start to complete


HttpRequestSender.startInternal() only catches and handles Exception, so any Error thrown in that method will always escape and the listener will never be invoked. That means the maybeDieOnAnotherThread() call never happens, and neither does the waitForStartToComplete() call in startSynchronously(), so we wouldn't ever expect to see the IllegalStateException get thrown from waitForStartToComplete().

If I change the test to use an IllegalArgumentException wrapping an Error, then the listener is invoked and we always get the IllegalStateException thrown from startSynchronously() due to timing out waiting for the sender to start. However, with that change, the test fails due to the error being thrown in another thread. I don't know how to tell a test to expect an exception to be thrown in another thread, but it looks like CloseFollowerIndexIT.wrapUncaughtExceptionHandler() might be trying to solve the same problem.

I wonder if we need to rethrow the Error at all in the case where we catch an Exception with an Error as one of its causes, or just log it and allow the waitForStartToComplete() call to inevitably time out?

Good catch.

> I wonder if we need to rethrow the Error at all in the case where we catch an Exception with an Error as one of its causes, or just log it and allow the waitForStartToComplete() call to inevitably time out?

Yeah I think I'm going to just log it and rely on the waitForStartToComplete(). After we refactor bedrock, I'm pretty sure we can remove startSynchronously() altogether or just use it for tests.
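As an illustration of the "Exception with an Error as one of its causes" case, here is a small sketch of how a caught exception's cause chain could be inspected before deciding to log. `ErrorCauseSketch` and `findErrorInCauseChain` are hypothetical names invented for this example and are not the helpers the PR actually uses.

```java
import java.util.Optional;

/** Hypothetical helper: looks for an Error anywhere in a throwable's cause chain. */
final class ErrorCauseSketch {
    // Bound the walk so a malformed, cyclic cause chain cannot loop forever.
    private static final int MAX_DEPTH = 10;

    private ErrorCauseSketch() {}

    static Optional<Error> findErrorInCauseChain(Throwable throwable) {
        int depth = 0;
        for (Throwable current = throwable; current != null && depth < MAX_DEPTH; current = current.getCause()) {
            if (current instanceof Error) {
                return Optional.of((Error) current);
            }
            depth++;
        }
        return Optional.empty();
    }
}
```

A caller following the approach agreed above would log the returned Error, if present, and let the synchronous waiter time out rather than rethrowing on another thread.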

DonalEvans · 2025-09-29T20:05:36Z

...test/java/org/elasticsearch/xpack/inference/external/http/sender/HttpRequestSenderTests.java

+        }
+    }
+
+    public void testCreateSender_CanCallStartAsyncMultipleTimes() throws Exception {


This test and the one below it could be improved a little by verifying that no matter how many times we call startAsynchronously() or startSynchronously(), we only call HttpClientManager.start() once:

    var clientManagerSpy = spy(clientManager);
    var senderFactory = new HttpRequestSender.Factory(createWithEmptySettings(threadPool), clientManagerSpy, mockClusterServiceEmpty());
    ...
    for (int i = 0; i < asyncCalls; i++) {
        PlainActionFuture<Void> listener = listenerList.get(i);
        assertNull(listener.actionGet(TIMEOUT));
    }
    verify(clientManagerSpy, times(1)).start();

It would also be nice if we could verify that we're calling waitForStartToComplete() the expected number of times.

DonalEvans · 2025-09-29T20:08:16Z

.../src/main/java/org/elasticsearch/xpack/inference/external/http/sender/HttpRequestSender.java

+        // Handle the case where start*() was already called and this would return immediately because the started flag is already true
+        waitForStartToComplete();


I'm wondering if we need to do something similar for async calls, since if two async calls come in one after the other, the second one will complete immediately even if the first one hasn't finished starting the sender yet.

Good idea, I tried to come up with a solution that would avoid having to spin up a thread to then call waitForStartToComplete(), since most of the time it will simply return.
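One way to picture "no extra thread" for later async callers is to have every caller attach to a single shared completion signal. The sketch below uses the JDK's `CompletableFuture` for that purpose; it is an assumption-laden illustration (the class, method signature, and `startWork` parameter are all hypothetical) and not necessarily the approach the PR ended up taking.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

/** Hypothetical sketch: all async callers share one completion signal instead of blocking a thread. */
class SharedStartSketch {
    private final AtomicBoolean started = new AtomicBoolean(false);
    private final CompletableFuture<Void> startCompleted = new CompletableFuture<>();
    private final AtomicInteger startCalls = new AtomicInteger();

    /**
     * @param startWork performs the start and must complete the supplied future when it finishes;
     *                  only invoked for the first caller
     * @param listener  notified with (null, null) on success or (null, error) on failure
     */
    void startAsynchronously(Consumer<CompletableFuture<Void>> startWork, BiConsumer<Void, Throwable> listener) {
        if (started.compareAndSet(false, true)) {
            startCalls.incrementAndGet();
            startWork.accept(startCompleted);
        }
        // Runs right away if the start already finished; otherwise runs when it does. No waiting thread is needed.
        startCompleted.whenComplete(listener);
    }

    int startCalls() {
        return startCalls.get();
    }
}
```

In the common case where the sender is already up, the listener is invoked immediately on the calling thread, which matches the observation that the wait will usually return at once.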


jonathan-buttner added 5 commits September 19, 2025 17:03

Refactoring init to be async

5833e3e

Fixing bedrock tests

c77289e

Fixing more tests

ac3f29c

Fixing tests

e3c9103

Merge branch 'main' of github.com:elastic/elasticsearch into ml-async…

de0251e

…-sender-init

jonathan-buttner added the >non-issue, :ml (Machine learning), Team:ML (Meta label for the ML team), and v9.2.0 labels Sep 25, 2025

[CI] Auto commit changes from spotless

a901dc6

jonathan-buttner commented Sep 25, 2025


jonathan-buttner and others added 4 commits September 25, 2025 14:25

Fixing typo

a5d9a4e

Merge branch 'ml-async-sender-init' of github.com:jonathan-buttner/el…

67cd5c4

…asticsearch into ml-async-sender-init

Adding more notes on bedrock

7b802ce

Merge branch 'main' into ml-async-sender-init

1c6d667

jonathan-buttner marked this pull request as ready for review September 26, 2025 12:16

DonalEvans reviewed Sep 26, 2025


jonathan-buttner and others added 5 commits September 26, 2025 16:35

Addressing feedback

0ea22a0

Adding exception handling

ff26624

Merge branch 'main' of github.com:elastic/elasticsearch into ml-async…

3779c49

…-sender-init

[CI] Auto commit changes from spotless

3277d23

Merge branch 'main' into ml-async-sender-init

ef179b3

jonathan-buttner requested a review from DonalEvans September 29, 2025 19:47

DonalEvans reviewed Sep 29, 2025


jonathan-buttner added 3 commits September 30, 2025 15:01

Refactoring async start

8d40454

rename

2110e95

Merge branch 'main' of github.com:elastic/elasticsearch into ml-async…

0eba997

…-sender-init

jonathan-buttner requested a review from DonalEvans September 30, 2025 21:19

DonalEvans approved these changes Sep 30, 2025


jonathan-buttner merged commit 940e8c8 into elastic:main Oct 1, 2025
35 checks passed



3 participants
